Enhancing a Genetic Algorithm with a Solution Archive to Reconstruct Cross Cut Shredded Text Documents
Abstract
This work applies a trie-based complete solution archive, combined with a genetic algorithm, to the Reconstruction of Cross-Cut Shredded Text Documents (RCCSTD) problem. The archive detects duplicate solutions and converts them into new, not yet visited solutions. Cross-cut shredded documents are documents that have been cut into rectangular pieces of equal size and shape. Reconstructing such documents can be of high interest in forensic science. Two types of tries are compared as the underlying data structure: an indexed trie and a linked trie. Experiments indicate that the latter needs considerably less memory without affecting run-time. While the archive-enhanced genetic algorithm yields better results for runs with a fixed number of iterations, the advantages diminish when run-time is considered, owing to the archive's additional overhead.
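To make the idea of a complete solution archive more concrete, the following is a minimal sketch, not the authors' implementation. It assumes solutions are fixed-length sequences over a small integer alphabet, which is a simplification of the actual RCCSTD encoding. The names `Node`, `SolutionArchive`, and `insert` are hypothetical. Each trie node records whether its whole subtree has already been visited; when a duplicate is inserted, the walk deviates at the deepest point where an unvisited branch still exists and returns that new solution instead.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children are stored sparsely in a dict (an assumption of this sketch).
    children: dict = field(default_factory=dict)
    # True once every solution below this node has been archived.
    full: bool = False


class SolutionArchive:
    """Hypothetical trie archive for fixed-length sequences over range(alphabet_size)."""

    def __init__(self, length, alphabet_size):
        self.length = length
        self.symbols = range(alphabet_size)
        self.root = Node()

    def _open(self, node, symbol):
        # A branch is open if it is missing or still has unvisited solutions.
        child = node.children.get(symbol)
        return child is None or not child.full

    def insert(self, solution):
        """Archive `solution`. Return it unchanged if it is new, an unvisited
        replacement if it is a duplicate, or None if nothing is left to visit."""
        assert len(solution) == self.length
        if self.root.full:
            return None
        node, path, result = self.root, [], []
        for wanted in solution:
            if self._open(node, wanted):
                symbol = wanted
            else:
                # Duplicate or exhausted branch: deviate to the first open symbol.
                symbol = next(s for s in self.symbols if self._open(node, s))
            path.append(node)
            node = node.children.setdefault(symbol, Node())
            result.append(symbol)
        node.full = True  # the leaf now represents a visited solution
        # Propagate completeness upward until an ancestor is still incomplete.
        for parent in reversed(path):
            parent.full = (len(parent.children) == len(self.symbols)
                           and all(c.full for c in parent.children.values()))
            if not parent.full:
                break
        return tuple(result)
```

In a genetic algorithm, each offspring would be passed through `insert` before evaluation, so that a revisited candidate is replaced by a nearby unvisited one rather than being evaluated again. The bookkeeping on every insertion is one source of the kind of overhead the abstract mentions.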
Similar resources
A Memetic Algorithm for Reconstructing Cross-Cut Shredded Text Documents
The reconstruction of destroyed paper documents has attracted growing interest in recent years. On the one hand, documents are often destroyed by mistake; on the other hand, this type of application is relevant in the fields of forensics and archeology, e.g., for evidence or restoring ancient documents. Within this paper, we present a new approach for restoring cross-cut shre...
Reconstructing Cross Cut Shredded Documents with a Genetic Algorithm with Solution Archive
The reconstruction of shredded documents is of high interest not only in forensic science but also when documents are destroyed unintentionally. Reconstructing cross-cut shredded documents (RCCSTD) is particularly difficult since the documents are cut into rectangular pieces of equal size. Since shape information along the edges—in contrast to hand torn pieces—cannot be exploited, the reconstru...
Semi-Automatic Reconstruction of Cross-Cut Shredded Documents
We propose a new approach for cross-cut shredded document reconstruction and evaluate it on the DARPA Shredder Challenge dataset. We begin by pre-processing chads. A set of costs based on shape (gaps, overlaps, edge similarity), graphical content (ruling line alignment, text line alignment), and semantic content (character and letter combinations) is calculated and used to rank putative chad ma...
An alternative clustering approach for reconstructing cross cut shredded text documents
In this paper, we propose a clustering approach for solving the problem of reconstructing cross-cut shredded documents. This problem is important in the field of forensic science. Unlike other clustering approaches which are applied as a preprocessing step before the actual reconstruction algorithms, our clustering approach is part of the reconstruction process itself. We define a new cost func...
Combining Forces to Reconstruct Strip Shredded Text Documents
In this work, we focus on the reconstruction of strip shredded text documents (RSSTD) which is of great interest in investigative sciences and forensics. After presenting a formal model for RSSTD, we suggest two solution approaches: On the one hand, RSSTD can be reformulated as a (standard) traveling salesman problem and solved by well-known algorithms such as the chained Lin Kernighan heuristi...